Abstract. Data assimilation is being increasingly used to merge remotely sensed land surface
variables such as soil moisture, snow and skin temperature with estimates from land models.
Its success, however, depends on unbiased model predictions and unbiased observations. Here,
a suite of continental-scale, synthetic soil moisture assimilation experiments is used to compare
two approaches that address typical biases in soil moisture prior to data assimilation: (i) parameter
estimation to calibrate the land model to the climatology of the soil moisture observations, and
(ii) scaling of the observations to the model’s soil moisture climatology. To enable this research,
an optimization infrastructure was added to the NASA Land Information System (LIS) that
includes gradient-based optimization methods and global, heuristic search algorithms. The land
model calibration eliminates the bias but does not necessarily result in more realistic model
parameters. Nevertheless, the experiments confirm that model calibration yields assimilation
estimates of surface and root zone soil moisture that are as skillful as those obtained through
scaling of the observations to the model’s climatology. Analysis of innovation diagnostics
underlines the importance of addressing bias in soil moisture assimilation and confirms that both
approaches adequately address the issue.

1. Introduction

Land data assimilation systems merge satellite or in situ observations of land surface fields
(such as soil moisture, snow and skin temperature) with estimates from land surface models.
Observations are often discontinuous in space and time, and their incorporation into the modeled
estimates helps generate spatially complete and temporally continuous estimates of land surface
fields. The process of combining observations and model forecasts is typically carried out by
weighting each based on their respective errors. The uncertainty in model states results from
model structural deficiencies, errors in model parameter specifications and input forcings.
Similarly, observational data also suffer from errors caused by instrument noise and errors
associated with the retrieval models. A key assumption in most data assimilation techniques is
that the errors in observations and model forecasts are strictly random and that on average, the
observations and model estimates agree with the true estimates. In reality, however, biases are
unavoidable and it is difficult to attribute the bias to the model or the observations. Nevertheless,
the proper treatment of such systematic errors is critical for the success of data assimilation
systems (Dee and da Silva [1998]).

A number of prior studies have described techniques to address the treatment of bias errors in
data assimilation systems. Dee [2005] characterizes the data assimilation systems as either
“bias-blind” or “bias-aware”, based on their treatment of systematic errors. The bias-blind systems
are designed to correct random, zero-mean errors and assume the use of unbiased observations
relative to the model-generated background. For soil moisture, the absolute levels of
continental-scale estimates from land surface models and satellite observations differ significantly
(Reichle et al. [2004, 2007]), which implies a need for “bias-aware” approaches to soil moisture
assimilation. An often used method to address such biases is to rescale the observations prior to
data assimilation in such a way that the observational climatology matches that of the land model
(Reichle and Koster [2004]; Drusch et al. [2005]; Crow et al. [2005]; Slater and Clark [2006];
Reichle et al. [2007]; Draper et al. [2009]; Kumar et al. [2009]; Reichle et al. [2010]; Liu et al.
[2011]; Draper et al. [2011]). Put differently, these so-called “a priori scaling” approaches
assimilate normalized deviates or percentiles instead of the raw observations. A priori scaling is
easy to implement as a preprocessing step to the data assimilation system and does not make
assumptions about whether the climatology of the model or that of the observations is more
correct. Although the resulting analyses are produced in the model’s climatology, they can be
scaled back to the observational climatology, if needed. However, since the computation of the
climatologies is conducted as a pre-processing step, the corrections cannot easily be adjusted to
dynamic changes in bias.

Dynamically bias-aware assimilation systems, on the other hand, incorporate specific
assumptions about the nature of biases and are specifically built to estimate and correct them.
These strategies typically attribute the bias to either the model or the observations and use the
analysis increments in the data assimilation system to estimate the bias. Variants of such dynamic
bias correction strategies have been used in soil moisture assimilation studies (De Lannoy et al.
[2007a, b]) and for land surface temperature assimilation by Bosilovich et al. [2007] and Reichle
et al. [2010]. In these studies, the observations are assumed to be unbiased, and the bias is
attributed to the model exclusively. In reality, however, the retrievals from different sensors may
be biased against each other (Reichle et al. [2007]; Trigo and Viterbo [2003]). The key advantage
of the dynamic bias estimation and correction approaches is their ability to adapt to transient
changes in bias.

In this article, we explore an alternative strategy for a priori bias correction that has not been
used for continental-scale soil moisture assimilation: the a priori calibration of land surface model
(LSM) parameters. We use optimization algorithms to estimate model parameters that minimize
the bias between model forecasts and observations. Similar to the a priori scaling methods
discussed above, the a priori calibration approach complements the state update steps of the
data assimilation system. In the latter, the model forecast is modified only when observations
are present. In the absence of observational information, the model will revert to its
original climatology. Adjusting model parameters offers a way to bring the model’s climatology
in line with that of the observations, including at times and locations where observations are
intermittently absent. Like a priori scaling, a priori model calibration does not adjust dynamically
to changes in model or observation bias.

Model parameters have long been recognized as a key source of errors in model predictions,
and many LSM studies have focused on the application of techniques to estimate them (Duan
et al. [1992]; Burke et al. [1997]; Gupta et al. [1999]; Hogue et al. [2005]; Liu et al. [2004, 2005];
Santanello et al. [2007]; Peters-Lidard et al. [2008]; Lambot et al. [2009]; Gutman and Small
[2010]; Nearing et al. [2010]). These studies estimate LSM parameters using independent
observations of variables such as soil moisture, streamflow and surface temperature. In addition,
data assimilation studies have also recognized the need to update and estimate model parameters
for improving the model’s predictive skills. A number of studies have examined the potential
of parameter estimation in conjunction with state estimation in sequential data assimilation
systems (Boulet et al. [2002]; Moradkhani et al. [2005]). These approaches, known as joint
estimation or state augmentation methods, estimate the model parameters concurrently with
the model states. Such approaches, however, have difficulties in handling the relative
time-invariance of parameters (compared to model states) and very large parameter spaces (Liu and
Gupta [2007]). De Lannoy et al. [2007a] note that in some situations it may be better to estimate 
the bias separately rather than correct it using state augmentation methods. An approach that 
employs the simultaneous use of optimization and data assimilation was described by Vrugt 
et al. [2005], where the model parameters are estimated through the recursive calibration over 
a data assimilation instance. This method considers the estimation of model parameter sets for 
generating the best possible forecasts, when model states are also adjusted through sequential 
data assimilation. The advantages and limitations of these joint state and parameter estimation 
approaches are discussed in detail in Liu and Gupta [2007]. 

Here we compare, in the context of data assimilation, the approach of bias mitigation through 
the estimation of model parameters against a priori bias correction strategies that rescale the 
observations to conform to the model’s climatology. The parameter estimation is performed in 
a “batch-calibration” mode, where a set of observational data is used to estimate time-invariant 
model parameters with the objective of minimizing the climatological differences between the 
model and the observations. The model with the calibrated parameters is subsequently employed 
in the data assimilation system to assimilate the raw, unscaled observations. In contrast, the
scaling approaches essentially assimilate the anomaly information instead of the raw observations.
We investigate these methods with a soil moisture assimilation case study. A new generation of
satellite soil moisture retrievals is becoming available from the recently launched Soil Moisture
and Ocean Salinity (SMOS; Kerr et al. [2010]) and the planned Soil Moisture Active Passive 
(SMAP; Entekhabi et al. [2010b]) missions. The results from our study are directly relevant to 
the effective utilization of these new observations in land data assimilation systems.

The experiments presented in this paper are conducted using the NASA Land Information
System (LIS; Kumar et al. [2006]; Peters-Lidard et al. [2007]), which is a multiscale modeling 
system for hydrologic applications developed with the goal of integrating satellite- and ground- 
based observational data products and advanced land surface models and techniques to generate 
improved estimates of land surface conditions. LIS includes a suite of subsystems to support 
land surface modeling for a variety of applications, including a comprehensive sequential data 
assimilation system, based on the NASA Global Modeling and Assimilation Office’s
infrastructure (Reichle et al. [2009]; Kumar et al. [2008b]). More recently, a generic optimization
subsystem has been developed within LIS, with the goal of combining the use of optimization 
and data assimilation in an integrated framework. This new extension to LIS will be described 
in detail below and was used to facilitate the experiments discussed here. 

The paper is organized as follows. The design and capabilities of the optimization subsystem 
within LIS are presented first (Section 2). This is followed by the description of the experiment 
setup that evaluates the use of parameter estimation in data assimilation (Section 3). The results 
from the data assimilation integrations are presented in Section 4. Finally, Section 5 discusses 
the conclusions from the study. 

2. Optimization subsystem in LIS 

LIS is designed as an object-oriented framework, where all functional extensions (such as 
land surface models, data assimilation algorithms, meteorological inputs, observational data, 
etc.) are implemented as abstract, extensible components (Kumar et al. [2006, 2008a]). A large
suite of modeling extensions has been incorporated in LIS using this design paradigm. The
optimization subsystem in LIS is designed in a similar interoperable manner.

2.1. Optimization abstractions

Generically, an optimization instance can be stated as a problem of determining unknown 
parameters by minimizing or maximizing an objective function subject to a number of constraints. 
The optimization subsystem in LIS defines three functional abstractions based on this generic 
form, shown in Figure 1: (1) objective function, (2) decision/parameter space and (3) algorithm 
used to solve the optimization problem. In the instance of parameter estimation, the decision 
space is defined by the list of LSM parameters (or a subset thereof). The objective function 
object represents the function or criteria to be maximized or minimized. Examples include the 
minimization of squared residuals and the maximization of likelihood measures. Finally, the 
optimization algorithm abstraction represents the actual search strategy used to find the optimal 
solution. The interconnections between these three generic pieces are handled within the LIS 
core, which is the unit that enables the integrated use of various extensible components in LIS. 
Custom implementations of each of these three abstractions constitute a specific instance of an 
optimization problem. 
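
The separation into three abstractions can be illustrated with a minimal Python sketch. It is not the LIS implementation (LIS wires these pieces together through its core and ESMF); the names `DecisionSpace`, `random_search` and the example bounds for `porosity` are hypothetical, and a plain random search stands in for the search algorithm only to keep the example short.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

Params = Dict[str, float]
# Abstraction 1: objective function (here: larger values are better).
ObjectiveFunction = Callable[[Params], float]


@dataclass
class DecisionSpace:
    """Abstraction 2: named parameters with (lower, upper) bounds."""
    bounds: Dict[str, Tuple[float, float]]

    def sample(self, rng: random.Random) -> Params:
        # Draw one candidate solution uniformly within the allowed ranges.
        return {name: rng.uniform(lo, hi) for name, (lo, hi) in self.bounds.items()}


def random_search(space: DecisionSpace, objective: ObjectiveFunction,
                  n_trials: int = 200, seed: int = 0) -> Tuple[Params, float]:
    """Abstraction 3: a search algorithm (placeholder for LM, SCE-UA or GA)."""
    rng = random.Random(seed)
    best, best_score = None, float("-inf")
    for _ in range(n_trials):
        candidate = space.sample(rng)
        score = objective(candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score


# Hypothetical usage: recover a "target" porosity of 0.45 within assumed bounds.
space = DecisionSpace({"porosity": (0.3, 0.6)})
best, score = random_search(space, lambda p: -(p["porosity"] - 0.45) ** 2)
```

Because the algorithm only sees the decision space and the objective through these narrow interfaces, any of the three pieces can be swapped without touching the other two, which is the point of the design described above.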

Similar to the design of the LIS data assimilation subsystem (Kumar et al. [2008b]), the data
exchanges between these abstractions are handled through the constructs of the Earth System
Modeling Framework (ESMF; Hill et al. [2004]). ESMF provides a standardized, self-describing
format for data exchange between these components. Three search algorithms of varying
complexity are implemented in this infrastructure: (1) Levenberg-Marquardt (LM; Levenberg [1944];
Marquardt [1963]), (2) Shuffled Complex Evolution from University of Arizona (SCE-UA; Duan
et al. [1992, 1993]) and (3) Genetic Algorithm (GA; Holland [1975]). LM is a gradient-based
search technique and is suited only for deterministic convex optimization problems, whereas
SCE-UA and GA are more suited for difficult combinatorial optimization problems such as
LSM parameter estimation.

2.2. Genetic Algorithm 

In this article, we employ GA for estimating LSM parameters. GAs are stochastic search 
techniques that use heuristics-based principles of natural evolution and genetics. The algorithm 
works by employing a population of individuals (or candidate solutions), each of which is 
represented by a set of values of the problem’s variables that need to be estimated (also called 
decision space). By applying operations that are based on natural evolution concepts, such 
as selection, recombination and mutation, the population evolves towards better solutions over 
several generations (or iterations). 

Figure 2 depicts a flow chart showing the sequence of GA operations during optimization. 
A fitness value that reflects the quality of the solution and its ability to satisfy constraints and 
objectives of the problem is associated with each potential solution. The selection operator 
simulates the “survival of the fittest” behavior by preferentially selecting the solutions with 
higher fitness to be present in the subsequent populations. As a result, solutions with good
traits survive and solutions with bad traits are eliminated. Each pair of selected solutions then 
undergoes the recombination step where two new solutions are generated by combining the 
“genes” of the parent solutions. The mutation operator is used to infuse the population with gene 
values that may not be present in the population. The recombination and mutation rates define 
the probability of crossover between any two pairs and the probability of a gene undergoing 
mutation, respectively. To ensure that the best solution in any generation is not lost through 
these probabilistic recombination and mutation operations, a strategy named elitism is used. 
Elitism ensures that the best solution from the previous generation is compared with the worst 
solution in the current generation, replacing the current generation’s solution, if better. These
steps are repeated through several iterations (or generations) until the specified convergence
criterion is met.
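
The sequence of operations described above can be sketched in a few lines of Python. This is an illustrative real-coded GA, not the LIS code: binary tournament selection, uniform gene swapping for recombination, and mutation by resampling within bounds are assumptions, since the text does not specify the operator variants. The default population size and rates are the values quoted in Section 3.3; a fixed number of generations replaces the convergence test for brevity.

```python
import random


def genetic_algorithm(fitness, bounds, pop_size=50, generations=100,
                      p_recomb=0.9, p_mut=0.005, seed=0):
    """Maximize `fitness` over box-bounded real-valued genes."""
    rng = random.Random(seed)
    n_genes = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    scores = [fitness(ind) for ind in pop]
    history = []  # best fitness after each generation

    def tournament():
        # Selection: the fitter of two randomly drawn individuals survives.
        a, b = rng.randrange(pop_size), rng.randrange(pop_size)
        return list(pop[a] if scores[a] >= scores[b] else pop[b])

    for _ in range(generations):
        best = max(range(pop_size), key=lambda j: scores[j])
        elite, elite_score = list(pop[best]), scores[best]

        children = []
        while len(children) < pop_size:
            c1, c2 = tournament(), tournament()
            # Recombination: swap genes between the pair with probability p_recomb.
            if rng.random() < p_recomb:
                for g in range(n_genes):
                    if rng.random() < 0.5:
                        c1[g], c2[g] = c2[g], c1[g]
            # Mutation: each gene is resampled within bounds with probability p_mut.
            for child in (c1, c2):
                for g, (lo, hi) in enumerate(bounds):
                    if rng.random() < p_mut:
                        child[g] = rng.uniform(lo, hi)
                children.append(child)

        pop = children[:pop_size]
        scores = [fitness(ind) for ind in pop]

        # Elitism: the previous best replaces the current worst, if better.
        worst = min(range(pop_size), key=lambda j: scores[j])
        if elite_score > scores[worst]:
            pop[worst], scores[worst] = elite, elite_score
        history.append(max(scores))

    best = max(range(pop_size), key=lambda j: scores[j])
    return pop[best], scores[best], history
```

With elitism in place, the best fitness recorded in `history` can never decrease from one generation to the next, which is the property the strategy is meant to guarantee.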

GAs do not rely upon local or gradient information and are able to deal with complexities in 
the search space such as the presence of local optima and discontinuities. GAs are also well
suited to handling discrete decision variables and nonlinearity in the simulation models.
The problem-independent structure of the algorithm has enabled its application in many areas
of science and engineering (Goldberg [1989]). GAs, however, require the evaluation of several
simulation runs to obtain the best solution, making them computationally intensive. The high 
performance computing tools in LIS are employed for mitigating this limitation (section 4.3). 

3. Experimental Setup 
3.1. Experiment overview 

In this section, we describe a suite of synthetic data assimilation experiments that examines 
parameter estimation as an a priori bias mitigation scheme. In addition, two variants of the a priori 
scaling method are used: standard-normal deviate scaling (Crow et al. [2005]) and cumulative
distribution function (CDF) matching (Reichle and Koster [2004]). The experiment setup is
similar to that of Kumar et al. [2009], but only two land surface models are used here. The Noah
land surface model (version 2.7.1; Ek et al. [2003]) employs the four-layer soil model of Mahrt
and Pan [1984] with thicknesses (listed from top to bottom) of 10, 30, 60 and 100 cm. In the
Catchment LSM (Koster et al. [2000]), the vertical soil moisture profile is determined through
deviations from the equilibrium soil moisture profile between the surface and the water table.
Soil moisture in the 0-2 cm surface layer and in the 0-100 cm root zone layer is diagnosed from the
modeled soil moisture profile. The Catchment LSM typically employs hydrologically defined
catchments (or watersheds) as basic computational units. In this study, however, the Catchment
LSM is used on a regular latitude-longitude grid to facilitate the model intercomparison. 

Using these land surface models, we conducted a suite of synthetic “fraternal twin” assimilation 
experiments. The basic structure of the experiments is as follows: First, a soil moisture simulation 
is conducted with the Catchment LSM to generate the assumed “true” state of the land surface, 
referred to as the control (or “truth”) run. Second, the observations to be assimilated are generated 
from this truth run by introducing realistic retrieval errors. Third, a suite of data assimilation 
integrations is conducted by assimilating these synthetic observations into the Noah land surface
model, using different bias mitigation strategies. The Noah model integration without any data 
assimilation is referred to as the “open loop” simulation. The assimilation integrations are 
conducted using a one-dimensional Ensemble Kalman Filter (EnKF) algorithm (see Reichle and 
Koster [2003] for details on 1D vs. 3D filtering). The performance of the assimilation approaches
is evaluated by comparing against the known true fields (from the Catchment LSM integration). 
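
For readers unfamiliar with the filter, the following is a minimal sketch of a one-dimensional (single grid cell) EnKF analysis step in Python with NumPy. It is a generic textbook form with perturbed observations, not the LIS or GMAO implementation; the assumption that the observation maps directly onto the first state element (the surface layer), the four-layer state and all numerical values are illustrative.

```python
import numpy as np


def enkf_update(ensemble, obs, obs_err_std, rng):
    """Update an ensemble of soil moisture profiles with one surface observation.

    ensemble: array of shape (n_members, n_layers); column 0 is the observed layer.
    """
    n_members = ensemble.shape[0]
    predicted_obs = ensemble[:, 0]                      # H x for each member
    state_anom = ensemble - ensemble.mean(axis=0)
    obs_anom = predicted_obs - predicted_obs.mean()

    # Sample covariance between every layer and the observed layer.
    cov_xy = state_anom.T @ obs_anom / (n_members - 1)  # shape (n_layers,)
    var_y = obs_anom @ obs_anom / (n_members - 1)

    # Kalman gain weights forecast and observation by their error variances.
    gain = cov_xy / (var_y + obs_err_std ** 2)

    # Each member assimilates its own perturbed copy of the observation.
    perturbed_obs = obs + rng.normal(0.0, obs_err_std, size=n_members)
    innovations = perturbed_obs - predicted_obs
    return ensemble + np.outer(innovations, gain)


rng = np.random.default_rng(0)
forecast = 0.20 + 0.02 * rng.standard_normal((12, 4))   # 12 members, 4 layers
analysis = enkf_update(forecast, obs=0.25, obs_err_std=0.03, rng=rng)
```

The update draws the ensemble toward the observation in proportion to the gain, so a persistent offset between model and observations would be treated as random error and repeatedly "corrected", which is why the bias has to be handled beforehand.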

3.2. Experiment details 

All model simulations are conducted on a gridded domain that roughly covers the Continental 
United States (CONUS, from 30.5°N, 124.5°W to 50.5°N, 75.5°W) at 1° spatial resolution, using 
a 30 minute model timestep. Surface meteorological boundary conditions from the Global Data 
Assimilation System (GDAS; the global meteorological weather forecast model of the National 
Centers for Environmental Prediction (Derber et al. [1991])) are used to drive the LSMs. The
models are cycled three times through the period from 1 January 2000 to 1 January 2006 to ensure 
that internal model states are in equilibrium with the forcing meteorology and parameters. The 
initial conditions generated from this “spinup” process are used in the data assimilation and 
open loop integrations except those that use the optimized parameters. The optimization-based
integrations use the soil moisture initial conditions estimated through calibration (section 3.3).
All model and assimilation integrations are conducted over the above-mentioned six year period. 

Each open loop or assimilation experiment with the Noah LSM consists of 12 ensemble 
members (Kumar et al. [2008b]), and the mean of the ensemble is used in the evaluations. In 
order to maintain an ensemble of model fields representing the uncertainty in soil moisture, 
perturbations are applied to select meteorological and model prognostic fields. The parameters 
used for these perturbations are based on previous work (Reichle et al. [2007]; Kumar et al. 
[2009]) and are listed in Table 2. Zero-mean, normally distributed additive perturbations are 
applied to the downward longwave radiation forcing, and log-normal multiplicative perturbations 
with a mean value of 1 are applied to the precipitation and downward shortwave fields (Table 2). 
Time series correlations are imposed via a first-order autoregressive model (AR(1)) with a time scale
of 24 hours. No spatial correlations are applied since this study uses the one-dimensional version 
of the EnKF. Cross correlations are imposed on the perturbations of radiation and precipitation 
fields using the values specified in Table 2. 
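
A minimal sketch of how such perturbations could be generated is given below. It assumes that the AR(1) "time scale" is an e-folding time and uses a placeholder standard deviation of 0.5 for the multiplicative factor (the actual values are in Table 2 and are not reproduced here); cross correlations between fields are omitted. The function names are hypothetical.

```python
import numpy as np


def ar1_series(n_steps, dt_hours, tau_hours, rng):
    """Zero-mean, unit-variance AR(1) noise with e-folding time `tau_hours`."""
    phi = np.exp(-dt_hours / tau_hours)        # lag-1 autocorrelation
    z = np.empty(n_steps)
    z[0] = rng.standard_normal()
    for t in range(1, n_steps):
        # Innovation scaled so that the stationary variance stays at 1.
        z[t] = phi * z[t - 1] + np.sqrt(1.0 - phi ** 2) * rng.standard_normal()
    return z


def lognormal_multiplier(z, std):
    """Map N(0, 1) noise to a log-normal factor with mean 1 and std `std`."""
    sigma2 = np.log(1.0 + std ** 2)
    return np.exp(np.sqrt(sigma2) * z - 0.5 * sigma2)


rng = np.random.default_rng(0)
z = ar1_series(n_steps=48 * 30, dt_hours=0.5, tau_hours=24.0, rng=rng)
precip_factor = lognormal_multiplier(z, std=0.5)   # multiply precipitation by this
# An additive perturbation (e.g. for longwave radiation) would be: std_lw * z
```

The shift of `-0.5 * sigma2` in the exponent is what keeps the mean of the multiplicative factor at 1, so that the perturbed precipitation is unbiased relative to the unperturbed forcing.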

In addition to the forcing perturbations, the Noah model prognostic variables for soil moisture 
are perturbed with additive noise that is vertically correlated (Table 2). For the perturbations to 
the model prognostics we impose AR(1) time series correlations with a 12 hour time scale. The 
perturbation settings do not introduce systematic biases in the open loop integrations relative to 
a standard, unperturbed, single-member model integration (not shown). 

A set of preprocessing steps is applied to the synthetic retrievals generated from the Catchment
LSM integration. To account for difficulties in retrieving soil moisture products from microwave
sensors, the synthetic observations are masked out when the green vegetation fraction values
exceed 0.7 and when snow or precipitation is present. Random Gaussian noise with an error
standard deviation of 0.03 m$^3$ m$^{-3}$ (volumetric soil moisture) is added to the Catchment model
surface soil moisture values to mimic measurement uncertainties. This error standard deviation
is chosen as an estimate of the expected error level in surface soil moisture retrievals from
upcoming space-borne L-band radiometers (Kerr et al. [2010]; Entekhabi et al. [2010b]).
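
These preprocessing steps amount to a noise-and-mask operation, sketched below in Python with NumPy. The function name and argument layout are hypothetical, and treating any positive snow or precipitation amount as "present" is an assumption, since the text does not give thresholds for those two fields.

```python
import numpy as np


def make_synthetic_obs(truth_sm, gvf, snow, precip, rng,
                       err_std=0.03, gvf_max=0.7):
    """Turn "truth" surface soil moisture into masked, noisy synthetic retrievals.

    All inputs are arrays of the same shape; masked values are returned as NaN.
    """
    # Zero-mean Gaussian retrieval error (m^3 m^-3).
    noisy = truth_sm + rng.normal(0.0, err_std, size=truth_sm.shape)
    # Retrievals are unavailable under dense vegetation, snow, or precipitation.
    masked = (gvf > gvf_max) | (snow > 0) | (precip > 0)
    return np.where(masked, np.nan, noisy)


rng = np.random.default_rng(0)
truth = np.array([0.25, 0.30, 0.18, 0.22])
gvf = np.array([0.4, 0.8, 0.5, 0.6])        # second cell exceeds the 0.7 threshold
snow = np.array([0.0, 0.0, 1.2, 0.0])       # third cell is snow covered
precip = np.zeros(4)
obs = make_synthetic_obs(truth, gvf, snow, precip, rng)
```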

Five different data assimilation integrations are conducted using these synthetic observations
(Table 1): (DA-NOSC) using unscaled observations without any bias correction, (DA-STDN)
using a priori scaled observations based on standard normal deviate scaling, (DA-CDF) using
a priori scaled observations based on CDF matching, (DA-OPT1) using unscaled observations
with a calibrated model, where the model parameters were estimated using a single year of batch
calibration (year 2000), and (DA-OPT6) using unscaled observations with a calibrated model,
where model parameters were optimized using all 6 years (2000-2006) of observations.

The approaches that employ a priori scaling of observations (DA-STDN and DA-CDF) repre- 
sent the commonly followed approaches of correcting biases prior to data assimilation by scaling 
the observations into the model climatology. The DA-CDF experiment follows the strategy of 
Reichle and Koster [2004] and matches the CDF of the observations to that of the model soil 
moisture. First, the observation and model CDFs are computed independently for each grid cell 
using the six year period. Next, the observations are rescaled, separately for each grid cell, such 
that their climatology matches that of the model soil moisture. In theory, this approach corrects 
all moments of the distribution regardless of its shape, although in practice the correction of 
higher order moments is naturally limited by the sample size. While the climatological differ- 
ences between the model and the observations may change with season (Drusch et al. [2005]), 
our experiment DA-CDF is based on CDFs derived with data from all seasons lumped together 
as in Reichle et al. [2007]. The standard normal deviate-based scaling used in the DA-STDN
experiment is a simpler approach that matches only the first and second moments of the observation
and model distributions but breaks the scaling down by calendar month to account for possible
seasonal changes in the climatological differences. This approach is used, for example, by Crow
et al. [2005]. For a given calendar month $k$ and a given grid cell $i$, the scaling parameters are the
multi-year mean ($\bar{\theta}^{m}_{i,k}$ and $\bar{\theta}^{o}_{i,k}$, for model and observations, respectively) and
multi-year standard deviation ($\sigma^{m}_{i,k}$ and $\sigma^{o}_{i,k}$, for model and observations, respectively).
For all observations $\theta_{i}$ from this particular calendar month (time subscript omitted), the
scaled observations $\theta'_{i}$ are then given by:

$$\theta'_{i} = \bar{\theta}^{m}_{i,k} + \left(\theta_{i} - \bar{\theta}^{o}_{i,k}\right) \frac{\sigma^{m}_{i,k}}{\sigma^{o}_{i,k}} \qquad (1)$$
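
The two a priori scaling variants can be sketched for a single grid cell as follows. This is an illustrative NumPy version, not the code used in the experiments: the function names are hypothetical, model statistics are computed from all model values in the record rather than only at observation times, and the CDF matching uses simple empirical ranks with linear interpolation between model quantiles.

```python
import numpy as np


def stdn_scale(obs, model, months):
    """Standard normal deviate scaling, applied per calendar month (1..12)."""
    scaled = np.full(obs.shape, np.nan)
    for k in range(1, 13):
        sel = months == k
        o, m = obs[sel], model[sel]
        if np.sum(~np.isnan(o)) < 2:
            continue                          # not enough data for this month
        mu_o, sd_o = np.nanmean(o), np.nanstd(o)
        mu_m, sd_m = np.nanmean(m), np.nanstd(m)
        if sd_o == 0:
            continue
        # Match the observation mean and standard deviation to the model's.
        scaled[sel] = mu_m + (o - mu_o) * sd_m / sd_o
    return scaled


def cdf_match(obs, model):
    """Map each observation to the model value with the same empirical percentile."""
    valid = ~np.isnan(obs)
    obs_sorted = np.sort(obs[valid])
    model_valid = model[~np.isnan(model)]
    # Empirical non-exceedance probability of each observation (all seasons lumped).
    ranks = np.searchsorted(obs_sorted, obs[valid], side="right")
    prob = np.clip((ranks - 0.5) / obs_sorted.size, 0.0, 1.0)
    scaled = np.full(obs.shape, np.nan)
    scaled[valid] = np.quantile(model_valid, prob)
    return scaled
```

Both functions leave missing observations as NaN. The first reproduces only the monthly mean and variance of the model, whereas the second reproduces the full shape of the model distribution but ignores seasonality, mirroring the trade-off described in the text.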


In contrast, the calibration-based integrations (DA-OPT1 and DA-OPT6) assimilate raw
(unscaled) observations and rely on the calibrated model parameters to mitigate bias in the data
assimilation system. Note that in the four experiments with bias correction, the information 
from the observation set is employed twice. In DA-STDN and DA-CDF, the observations are 
used once for deriving the climatology and then for assimilation, when the scaled observations 
are assimilated. Similarly in DA-OPT1 and DA-OPT6, the same set of observations is employed 
twice, once for the calibration of the model climatology and then again for the subsequent data 
assimilation. We do not separate the periods of model calibration and data assimilation in ex- 
periments DA-OPT1 and DA-OPT6 in order to provide an equivalent comparison to DA-STDN 
and DA-CDF. 

Note that a priori scaling and model calibration are intended to address the relative bias 
between the model and the observations. The data assimilation system then works with a set 
of observations that are unbiased relative to the model background. In this sense, the synthetic 
experiment used here represents the issues in a “real” data assimilation system. The long-term 
mean and variability of satellite, in-situ and model soil moisture estimates differ from each
other due to representativeness differences (horizontal and vertical), limited sensor calibration,
retrieval model assumptions and model deficiencies, implying that, in a climatological sense,
none of the datasets is necessarily more correct than any other (Reichle and Koster [2004];
Reichle et al. [2007]). Consequently, our use of the “truth” label for the synthetic observations 
does not necessarily imply that satellite-based retrievals are unbiased. 

3.3. Optimization formulation for parameter estimation 

In experiments DA-NOSC, DA-STDN, and DA-CDF we use the Noah LSM with its native 
parameters that are mostly based on look up tables (as functions of vegetation and soil categories), 
the same parameters that are used in the operational environments at the National Centers for 
Environmental Prediction (NCEP) and the Air Force Weather Agency (AFWA). For experiments 
DA-OPT1 and DA-OPT6, by contrast, we estimate spatially distributed representations of Noah 
model parameters through GA optimization (section 4.1). 

Table 3 lists the parameters included in the decision space in the optimization simulations based 
on Hogue et al. [2005]. The decision space includes a number of vegetation and soil properties
along with the initial soil moisture states. The initial set of potential solutions in GA is generated 
by randomly sampling from the range of each parameter as specified in Table 3. A population 
size of 50 is used in the GA simulations. 

The objective function at each grid point is defined as the inverse of the absolute difference
in the mean soil moisture values of the observation and the model (Equation 2), where $J_{i}$ is
the fitness value for grid cell $i$, and $\bar{\theta}^{o}_{i}$ and $\bar{\theta}^{m}_{i}$ are the mean soil moisture values from the
observations (from the Catchment LSM) and simulated by the Noah model, respectively, for grid
cell $i$. The mean soil moisture values $\bar{\theta}^{o}_{i}$ and $\bar{\theta}^{m}_{i}$ are computed at each grid point $i$ by averaging
the available soil moisture values over the course of the model simulation. The denominator of
the objective function thus represents the absolute soil moisture climatology difference between
the observations and the model.

$$J_{i} = \frac{1}{\left|\bar{\theta}^{o}_{i} - \bar{\theta}^{m}_{i}\right|} \qquad (2)$$
This objective function is maximized independently for each grid cell i. The optimization 
explores the decision space to maximize the fitness function values, subject to the allowed
range of values for each parameter (Table 3). 

The GA integrations use an elitism strategy to ensure that the current best solution is not 
overwritten during GA evolution. A mutation rate of 0.005 and a recombination rate of 0.9 were
employed. The algorithm was found to converge after approximately 200 generations, when
the fitness of the best solution was found not to improve in the last 30 generations. These GA
parameters (including the mutation and recombination rates) are chosen largely from experience,
and the success of the optimization simulations presented in Section 4.1 suggests that they are
reasonable. 


4. Results 

The results presented in this section focus first on the optimization simulations, that is, the 
model calibration conducted prior to the DA-OPT1 and DA-OPT6 assimilation integrations. 
Following this discussion, the different bias mitigation strategies are evaluated within the context 
of soil moisture data assimilation. 

4.1. Optimization simulations

Two separate optimization simulations are conducted: (1) using a single year of observational
data (OPT1; observations from year 2000) and (2) using observations from all six years (OPT6; 
years 2000 - 2006). First, we compare the Noah model integrations using these two sets of 
LSM parameters with the open loop simulation that employs the default values from the look 
up table. Figure 3 presents maps of time series mean (climatological) differences in surface 
soil moisture (which is essentially the inverse of the objective function used in the optimization 
simulations). As discussed in section 3.3, the maps are computed by subtracting the mean Noah 
LSM soil moisture values for each of the integrations shown in the figure from the corresponding 
mean Catchment LSM surface soil moisture estimates. In computing these mean fields, we only 
include the times and locations for which (synthetic) observations are available (section 3.2). 
Further, only grid points with at least 600 observations for the evaluation period are considered 
in the analysis of the results. 

Figure 3 demonstrates that using the optimized parameters reduces the systematic
differences in climatologies between the model and observations throughout the domain. These
maps indicate that the Noah open loop integration generates on average (but not uniformly) drier 
soil moisture values compared to the Catchment LSM. The use of optimized parameters helps 
to correct the bias. Both OPT1 and OPT6 integrations improve this systematic underestimation 
in the open loop by providing closer matches to the Catchment (“truth”) estimates, as seen in 
the bottom two panels of Figure 3. The domain averaged soil moisture climatology difference 
is reduced from 0.034 m 3 m -3 (for OL) to 0.006 m 3 m -3 for OPT1 and to -0.003 m 3 m -3 for 
OPT6. If absolute values of climatology differences are used, the improvements from OPT1 and 
OPT6 are even more pronounced; the domain averaged absolute difference reduces from 0.047 
m 3 m -3 for OL to 0.010 m 3 m -3 for OPT1 and 0.009 m 3 m -3 for OPT6. The estimation of model 
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parameters thus enables the correction of systematic biases and leads to a closer match between 
the soil moisture climatologies of the model (Noah) and the synthetic observations (Catchment). 

Figure 4 shows maps of the parameters used in the open loop integration (prescribed using 
look up tables) and the calibrated values from the OPT6 integration. Out of the parameters listed 
in Table 3 we focus on three key parameters: porosity (θ_s), saturated matric potential (ψ_s) and 
saturated hydraulic conductivity (K_s). The spatial patterns in the look up table-based parameters 
are similar to each other, because they are determined based on the soil texture map. In contrast, 
the optimized parameters show more spatial variability, because they are not constrained to soil 
types or vegetation categories. Compared to the default parameters, the optimized parameters 
in general show higher values of θ_s, ψ_s and K_s over the domain. This is consistent with the 
optimization objective of correcting the dry bias in the open loop integration, as higher values 
of θ_s, ψ_s and K_s would allow for more water to be held in the soil and more infiltration into the 
soil, and correspondingly higher soil moisture values. Similar spatial trends are also observed 
in other parameters (not shown). 

Although these spatial trends are consistent with the patterns in soil moisture simulations, the 
intent here is not to judge the veracity or physical realism of the estimated parameters. Instead, 
our goal is to study how bias mitigation through parameter estimation helps in the subsequent 
data assimilation performance. Though the typical approach in land surface models is to employ 
look up table-based parameters that are derived from limited data samples (e.g. Rawls et al. 
[1982]; Cosby et al. [1984]), these representations suffer from numerous issues, including lack 
of spatial representativeness of the datasets on which they are based, errors in extrapolating the 
point-scale to the modeling scales, and the large within-soil class variation of properties that is 
on par with the variation across different texture classes (Schaap [2004]; Braun and Schadler 
[2005]; Doherty and Welter [2010]; Gutman and Small [2010]). Further, the physical realism and 
mismatch issues of the parameters are difficult to assess at large spatial scales because validating 
in situ measurements of surface and root zone soil moisture that match the scale of the model 
grid cells are not available. 

In short, there is significant uncertainty associated with the default parameters, which are 
typically regarded as the “truth”. The optimization formulation in this article samples from the ranges of 
parameters (Table 3) representing the full spectrum across all look up table categories. Additional 
look up table category-based constraints can be introduced on these parameter ranges to ensure 
that the estimated parameters conform to the traditional, category-based (e.g. soil texture-based) 
notions of physical realism. Algorithms and approaches that incorporate notions of “equifinal” 
solutions (e.g., Gupta et al. [1999]; Hogue et al. [2006]) may offer more effective ways to represent 
parameter uncertainty and to ensure physical consistency since they generate a range of 
plausible model fits. The use of such methods is left for future work. Here, the parameter 
sets generated by the optimization simulations OPT1 and OPT6 may represent mismatches with 
regard to the typical category-based definitions. 

4.2. Data assimilation experiments 

This section presents the results from data assimilation experiments that employ different 
strategies for bias correction (section 3.2). Since the suite of experiments includes simulations 
that assimilate both unscaled (experiments DA-NOSC, DA-OPT1 and DA-OPT6) and scaled 
observations (experiments DA-STDN and DA-CDF), we primarily use the anomaly time series 
correlation coefficient (R) to quantify the skill of the model simulations. 

The anomaly time series for each grid point is estimated as follows: The monthly-mean climatology 
values are subtracted from the daily average raw data, so that the anomalies represent the 
daily deviations from the mean seasonal cycle. The skill contribution from correctly identifying 
the mean seasonal variation is therefore excluded. The anomaly R values are computed, separately 
for each grid point, as the correlation coefficients between the daily anomalies from the 
assimilation estimates and the corresponding truth data. Only anomalies at times and locations 
for which observations are assimilated contribute to the computation of the R values. Similar 
to the comparisons in Section 4.1, only grid points with at least 600 assimilated observations 
during the evaluation period are included in the evaluations. 
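
The anomaly R computation for a single grid point can be sketched as follows. This is a minimal illustration under stated assumptions rather than the code used in this study: the function names and arguments are hypothetical, and the monthly climatology is assumed to be the multi-year mean of each calendar month computed from all available daily values.

```python
import numpy as np


def monthly_anomalies(values, months):
    """Subtract the multi-year mean of each calendar month from daily values.

    values : 1-D array of daily averages for one grid point.
    months : 1-D integer array (1-12) giving the calendar month of each day.
    """
    values = np.asarray(values, dtype=float)
    months = np.asarray(months)
    anomalies = np.full(values.shape, np.nan)
    for m in np.unique(months):
        sel = months == m
        if np.any(~np.isnan(values[sel])):
            anomalies[sel] = values[sel] - np.nanmean(values[sel])
    return anomalies


def anomaly_r(estimate, truth, months, assimilated, min_obs=600):
    """Correlation of daily anomalies at times when observations were assimilated.

    Returns NaN if fewer than min_obs assimilated observations are available.
    """
    est_anom = monthly_anomalies(estimate, months)
    tru_anom = monthly_anomalies(truth, months)
    use = (np.asarray(assimilated, dtype=bool)
           & ~np.isnan(est_anom) & ~np.isnan(tru_anom))
    if use.sum() < min_obs:
        return np.nan
    return float(np.corrcoef(est_anom[use], tru_anom[use])[0, 1])
```

Because the seasonal cycle and any constant offset are removed before correlating, two series that differ only in mean or amplitude still yield R = 1, which is why the metric can be applied to both scaled and unscaled experiments.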

Figure 5 shows the comparison of the anomaly R values for surface soil moisture from different 
model integrations. Overall, the assimilation experiments perform better than the open loop 
simulation, and the assimilation skill systematically improves from experiment DA-NOSC to 
experiment DA-OPT6. The domain averaged skill of the Noah model integration without any 
data assimilation (OL) is 0.47. When observations are assimilated without bias correction 
(DA-NOSC), the domain averaged skill improves to 0.63. The assimilation skill is further improved 
in the integrations that employ a priori scaling of observations, with domain averaged skill values 
of 0.71 and 0.73, for DA-STDN and DA-CDF, respectively. For the climatological differences 
encountered in this synthetic experiment, the use of higher-order moments in the CDF matching 
technique slightly outperforms the seasonally varying scaling parameters used in DA-STDN. 
Finally, surface soil moisture skill values of 0.73 and 0.75 are obtained for experiments 
DA-OPT1 and DA-OPT6, respectively, when assimilation integrations are conducted with optimized 
parameters that conform to the Catchment LSM (truth) climatology. 
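
To make the difference between the two a priori scaling strategies concrete, the following sketch contrasts rescaling with the first two moments (as in DA-STDN) with empirical CDF matching (as in DA-CDF). It is a simplified illustration, not the implementation used in the experiments: the function names are hypothetical, and the climatological samples passed in are assumed to be already restricted to the relevant grid point and season.

```python
import numpy as np


def scale_standard_normal_deviates(obs, obs_clim, model_clim):
    """Rescale observations so that their mean and standard deviation
    match those of the model climatology."""
    obs = np.asarray(obs, dtype=float)
    z = (obs - np.mean(obs_clim)) / np.std(obs_clim)
    return np.mean(model_clim) + z * np.std(model_clim)


def scale_cdf_matching(obs, obs_clim, model_clim):
    """Map each observation to the model value with the same empirical
    cumulative probability."""
    obs_sorted = np.sort(np.asarray(obs_clim, dtype=float))
    model_sorted = np.sort(np.asarray(model_clim, dtype=float))
    # Plotting positions of the two empirical CDFs.
    p_obs = (np.arange(obs_sorted.size) + 0.5) / obs_sorted.size
    p_model = (np.arange(model_sorted.size) + 0.5) / model_sorted.size
    # Cumulative probability of each observation, then the matching model quantile.
    p = np.interp(obs, obs_sorted, p_obs)
    return np.interp(p, p_model, model_sorted)
```

The first function only adjusts the mean and variance, whereas the second adjusts the whole distribution and therefore also corrects differences in higher-order moments. Repeated values in the climatological samples would need extra care in the interpolation step, which this sketch does not address.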

The assimilation of surface soil moisture retrievals is often used as a way to generate superior 
estimates of related states such as root zone soil moisture (Reichle et al. [2007]; Kumar et al. 
[2009]). Figure 6 presents a comparison of the root zone soil moisture skill estimates from 
different model integrations. Similar to the behavior observed for surface soil moisture, the 
skill of root zone estimates from using the calibrated model is comparable to the skills from 
a priori scaling approaches. The domain averaged open loop root zone skill estimate is 0.45 
and it improves to 0.54 when assimilation is performed without bias correction (DA-NOSC). 
The skill further improves to 0.62 and 0.63, through the use of a priori scaling of observations, 
for integrations DA-STDN and DA-CDF, respectively. Finally, the use of a calibrated model 
together with the assimilation of unscaled observations provides domain averaged skill values 
of 0.62 and 0.63, for integrations DA-OPT1 and DA-OPT6, respectively. For root zone soil 
moisture, the relative advantage of the a priori calibration strategy (DA-OPT1, DA-OPT6) over 
the a priori scaling methods (DA-STDN, DA-CDF) is minimal. The 95% confidence intervals 
of the domain averaged anomaly R values are in the range of 0.008 to 0.01, verifying that the 
improvements obtained through data assimilation in both surface and root zone soil moisture are 
statistically significant. 

In a separate analysis (not shown), we also examined the skill improvements in surface fluxes 
(latent, sensible and ground heat) from the data assimilation integrations. The assimilation runs 
with bias correction (DA-STDN, DA-CDF, DA-OPT1, and DA-OPT6) were found to marginally 
improve the surface flux skill values over the open loop and DA-NOSC integrations, with a priori 
scaling and a priori calibration yielding comparable results. 

Figures 5 and 6 also indicate that soil moisture skill values improve consistently across the 
domain in the data assimilation integrations. To further illustrate this fact, Figure 7 shows 
probability density functions (PDFs) for surface and root zone soil moisture skill values across the 
modeling domain. Compared to the PDF for the OL integration, the PDFs from data assimilation 
integrations show narrower distributions that are skewed towards higher skill values, due to 
the improved soil moisture estimates from assimilation. For surface soil moisture, the PDF 
for DA-NOSC is shifted towards higher R values, but shows only a marginal reduction in the 
spread compared to the PDF for OL skill (the standard deviation of the PDF reduces from 
0.156 to 0.142). The runs based on a priori scaling (DA-STDN and DA-CDF) yield a greater 
reduction in the OL spread (standard deviations of 0.121 and 0.093, respectively) and a further 
shift towards higher skill values. The DA-OPT1 and DA-OPT6 integrations provide similarly 
reduced variability in skill estimates (that is, consistent improvements) across the domain, with 
standard deviations in PDFs of 0.113 and 0.091, respectively. Comparable but more muted 
trends are observed for root zone soil moisture, where the variability in skill values also reduces, 
gradually from the OL to DA-OPT6. In summary, Figure 7 indicates that a priori calibration and 
a priori scaling yield comparable improvements in surface and root zone skill. 

The anomaly R metric is indifferent to any bias in the mean or the amplitude of variations. By 
contrast, the RMSE is highly sensitive to biases. As mentioned earlier, the long-term mean bias 
with respect to the true conditions is difficult (if not impossible) to determine for continental-scale 
soil moisture. To supplement the anomaly R skill values presented above, we now assess the 
“unbiased” RMSE (ubRMSE) values, which are computed from the time series after removal of 
the long-term mean bias (Entekhabi et al. [2010a]). Table 4 provides a comparison of the domain 
averaged ubRMSE values from different model simulations, which shows similar trends to those 
seen with the anomaly R metric. For surface soil moisture, the domain-averaged ubRMSE 
for the OL integration is 0.052 m^3 m^-3, which reduces to 0.041 m^3 m^-3 for DA-NOSC. The 
scaling-based DA runs DA-STDN and DA-CDF improve these estimates to 0.038 m^3 m^-3 and 
0.037 m^3 m^-3, respectively. The optimization-based runs DA-OPT1 and DA-OPT6 provide 
comparable skills to those of the scaling-based runs, with domain averaged ubRMSE values of 
0.037 and 0.036 m^3 m^-3, respectively. The root zone soil moisture skill values follow similar 
trends. The domain averaged ubRMSE for OL is 0.039 m^3 m^-3, and it improves to 0.037 
m^3 m^-3 in the DA-NOSC simulation. Both a priori scaling and optimization-based approaches 
provide systematic, statistically significant improvements (relative to OL) with domain-averaged 
ubRMSE of 0.035, 0.034, 0.033 and 0.033 m^3 m^-3, for integrations DA-STDN, DA-CDF, 
DA-OPT1, and DA-OPT6, respectively. 
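
For reference, the ubRMSE at a single grid point can be written as a short function. This is a generic sketch of the metric as described above (RMSE after removal of the long-term mean bias); the function name and arguments are illustrative and not taken from LIS.

```python
import numpy as np


def ubrmse(estimate, truth):
    """RMSE of (estimate - truth) after removing the long-term mean difference."""
    estimate = np.asarray(estimate, dtype=float)
    truth = np.asarray(truth, dtype=float)
    valid = ~np.isnan(estimate) & ~np.isnan(truth)
    err = estimate[valid] - truth[valid]
    # Subtracting the mean error removes the bias contribution to the RMSE.
    return float(np.sqrt(np.mean((err - err.mean()) ** 2)))
```

Equivalently, the squared ubRMSE equals the squared RMSE minus the squared mean bias, so a constant offset between the estimate and the truth does not affect the metric.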

An important aspect of a priori bias mitigation approaches is the fact that they require an a priori 
estimate of the climatology of the observations. Reichle and Koster [2004] demonstrate that for 
the a priori scaling approach, a single year of observations may be sufficient if some spatial 
averaging over neighboring grid cells is employed to reduce sampling noise. In this context, it 
is encouraging that the assimilation skill values from the DA-OPT1 and DA-OPT6 integrations 
are comparable, with DA-OPT6 generating an additional domain averaged improvement over DA-OPT1 
of only 0.02 for surface and 0.01 for root zone soil moisture. In other words, most of the benefit 
of the a priori calibration method can be achieved with just one year’s worth of observations, 
provided the climatology can be reasonably approximated from the available data year, which is 
the case here (not shown). This suggests that using a short time period for calibration can still 
be an effective strategy, which is especially important for new types of satellite missions when 
the period of available data is relatively short. 

Further, note that the objective function formulation (equation 2) is designed to only correct the 
first moment of the model and observation distributions, whereas the a priori scaling approaches 
are designed to correct multiple moments of the distributions. Nevertheless, the assimilation 
skills from the a priori scaling and a priori optimization approaches are already comparable, 
indicating that further skill improvements may be achieved using objective function formulations 
designed to correct multiple moments of the distributions. 

4.3. Computational considerations 

Data assimilation with bias mitigation through a priori calibration (DA-OPT1, DA-OPT6) 
yields surface and root zone soil moisture estimates that are at least as skillful as those 
obtained with bias mitigation through a priori scaling (DA-STDN, DA-CDF). It should be noted, 
however, that the estimation of the optimization parameters through batch calibration has an 
associated computational cost. The scalable computing infrastructure in LIS helps in reducing 
this overhead through parallel computation using multiple processors. The OPT6 integration 
requires 200 iterations of LIS runs over the 2000-2006 period, which translates to a wall clock 
time of approximately a week using 128 processors. In comparison, the OPT1 integration requires 
approximately a day (using 128 processors). The comparable skill of the short calibration-based 
run (DA-OPT1) relative to the long calibration-based run (DA-OPT6) indicates that the high 
computational cost associated with batch calibration can be considerably reduced by using a 
shorter time period of observations that adequately represents the overall climatology. The 
dimensionality of the decision space can also be reduced by selecting a smaller number of 
parameters to which the soil moisture simulations are likely to be most sensitive, which would 
further reduce the computational cost associated with the optimization simulations. 

4.4. Innovation metrics 

In this section, we examine the filter innovations (observation minus model forecast residuals) 
from the assimilation experiments. This analysis provides insights into the performance of the 
data assimilation integrations (Reichle et al. [2002]; Crow and Van Loon [2006]; Reichle et al. 
[2007]; Kumar et al. [2008b]). Strictly speaking, the EnKF provides optimal estimates only if 
several assumptions hold, including linear system dynamics with model and observation errors 
that are Gaussian and mutually and serially uncorrelated. If these assumptions hold, then the 
distribution of normalized innovations (normalized with their expected covariance) follows a 
standard normal distribution, N(0, 1) (Gelb [1974]). Deviations from the expected mean 
and variance of the normalized innovation distribution provide a measure of the degree of 
suboptimality with which the assimilation system performs. 
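
For a scalar observation, the normalization amounts to dividing each innovation by the square root of the sum of the forecast error variance (in observation space) and the observation error variance. The sketch below illustrates this diagnostic under simplifying assumptions that are introduced here for illustration only: one observation per time, a forecast error variance taken from the ensemble spread, and hypothetical function and argument names.

```python
import numpy as np


def normalized_innovations(obs, forecast_ensemble, obs_error_var):
    """Innovations divided by their expected standard deviation.

    obs               : array of shape (time,), assimilated observations.
    forecast_ensemble : array of shape (time, nens), ensemble forecasts of the
                        observed quantity.
    obs_error_var     : observation error variance (scalar or shape (time,)).
    """
    ens = np.asarray(forecast_ensemble, dtype=float)
    innovation = np.asarray(obs, dtype=float) - ens.mean(axis=1)
    forecast_var = ens.var(axis=1, ddof=1)
    return innovation / np.sqrt(forecast_var + obs_error_var)


def innovation_statistics(normalized):
    """Sample mean and variance, to be compared with the targets 0 and 1."""
    normalized = np.asarray(normalized, dtype=float)
    return float(np.mean(normalized)), float(np.var(normalized, ddof=1))
```

A mean far from zero points to residual bias between observations and model forecasts, while a variance well above one indicates that the assumed model and observation error variances underestimate the actual errors.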

Unsurprisingly, the integration without a priori bias mitigation exhibits the largest innovation 
biases, reflecting strong biases between the (synthetic) observations and the corresponding model 
forecasts (not shown). The a priori scaling (DA-STDN, DA-CDF) and a priori calibration 
approaches (DA-OPT1, DA-OPT6) clearly mitigate these biases (not shown). Figure 8 presents 
maps of the variance of the normalized innovations. For the bias-blind assimilation integration 
(DA-NOSC), the variance of the normalized innovations is on average 2.38 and far exceeds the 
target value of 1, which reflects the strong underestimation of the actual errors by the assimilation 
system because it ignores the bias. Adding a priori bias mitigation strategies brings the variance 
of the normalized innovations much closer to the target value of 1. Based on this metric, the 
assimilation using the CDF-based a priori scaling (DA-CDF) operates closer to optimality than 
the simpler strategy that uses only the first and second order rescaling (DA-STDN). Likewise, 
the variance of the normalized innovations is closer to the target value of 1 when all years are used 
in the a priori calibration (DA-OPT6) rather than just one year (DA-OPT1). 

5. Summary 

Data assimilation methods such as the EnKF require that the errors in the model and the 
observations are strictly random. As a result, the presence of systematic or bias errors needs to be 
addressed separately within the data assimilation system. In this study, we evaluate a number of 
bias mitigation strategies in the context of assimilating surface soil moisture retrievals. 
Specifically, we examine the use of land model parameter estimation as a bias correction strategy prior 
to data assimilation. This strategy is compared to the approach of scaling the assimilated 
observations to the land model’s climatology prior to data assimilation. The study is conducted using 
a fraternal twin experiment setup, where synthetic observations generated using the Catchment 
LSM are assimilated into the Noah LSM. Five different data assimilation experiments are 
conducted, each using a different strategy to correct (or not) for bias prior to data assimilation. The 
resulting soil moisture estimates are evaluated against the corresponding synthetic truth fields 
from the Catchment LSM. 

Our results indicate that a priori land model calibration is an effective strategy for bias 
mitigation in soil moisture assimilation. The domain averaged skill estimates (in terms of anomaly R 
values) for the Noah open loop simulation without any data assimilation are 0.47 for surface soil 
moisture and 0.45 for root zone soil moisture. These skill estimates improve to 0.63 for surface 
soil moisture and 0.54 for root zone soil moisture, when assimilation is conducted without any 
bias correction (DA-NOSC). When observations are assimilated after rescaling to the model 
climatology, the assimilation skill improves further. Two approaches for a priori scaling are 
considered: (DA-STDN) using standard normal deviates and (DA-CDF) by matching the CDF of 
the observations to that of the model. Assimilation using these a priori scaling approaches yields 
domain averaged skill values of 0.71 and 0.73 for surface soil moisture and 0.62 and 0.63 for root 
zone soil moisture, respectively. Similar improvements in the surface and root zone soil moisture 
estimates are observed with the assimilation runs that employ optimized model parameters but 
ingest unscaled observations. Two sets of optimized parameters are used in the experiments: 
(DA-OPT1) parameters estimated from a single year of calibration and (DA-OPT6) parameters 
estimated from six years of calibration. When data assimilation is conducted using parameters 
from a single year of calibration, skill estimates of 0.73 for surface soil moisture and 0.62 for 
root zone soil moisture are obtained. The use of the six-year based parameters further improves 
these skill measures to 0.75 for surface soil moisture and 0.63 for root zone soil moisture. 

It was also observed that spatial variability in the skill scores across the domain is reduced 
with the use of optimized parameters, resulting in more spatially consistent skill enhancements. 
The skill improvements in surface fluxes were found to be comparable for data assimilation 
following a priori scaling and a priori calibration. Similar trends in skill scores are also observed 
if the unbiased RMSE metric is used instead of anomaly R for evaluating the results. Finally, 
the analysis of innovation diagnostics also demonstrates that without the use of suitable bias 
correction, the assimilation system performs in a less than optimal manner and that all four bias 
mitigation strategies adequately address the bias issue. 

In the suite of synthetic experiments presented in this article we are in effect calibrating the 
Noah surface soil moisture climatology to that of the Catchment LSM. It must be stressed that 
this approach is chosen not because one model (Catchment) is more correct than the other (Noah). 
A similar argument holds when satellite soil moisture retrievals are assimilated. In that case, the 
climatology of the retrievals is not necessarily more correct than that of the model. However, 
when brightness temperatures are assimilated in radiance space instead of the retrievals, the model 
should be calibrated to the observed brightness temperature climatology. The long-term biases 
can be mitigated through calibration and the remaining shorter-term biases can be addressed 
with a priori scaling. The combined use of these strategies will be examined in future 
radiance-based data assimilation experiments. 


Though effective, the approach of using parameter estimation for bias correction also suffers 
from the limitations of the a priori scaling approaches. Since the parameters are estimated in 
advance of data assimilation, any subsequent changes in model behavior will not be captured, 
unlike in the dynamic bias estimation algorithms. The optimization formulation does not 
constrain the estimated parameters to conform to the traditional, look up table-based definitions of 
parameters. Here, no attempt was made to ensure the physical realism of the estimated 
parameters. The calibration might also require additional constraints to ensure that the behavior of 
related variables is not adversely affected. Note, however, that we have found that the estimates 
of the latent and sensible heat fluxes were comparable for the assimilation integrations with bias 
correction (DA-STDN, DA-CDF, DA-OPT1, and DA-OPT6). Furthermore, our results suggest 
that using model parameter estimation could be a viable strategy for bias mitigation in cases of 
relatively short (i.e., one year) satellite records. This result is important for expediting the use 
of soil moisture retrievals becoming available from SMOS and SMAP. 

The study also demonstrates the advanced capabilities of the NASA LIS framework, including 
the development of a new subsystem for optimization. This extension encapsulates a range of 
advanced search algorithms suited for both convex and non-convex optimization problems. In 
this particular study, the Genetic Algorithm, a heuristic search technique based on principles 
of evolutionary computing, is employed for estimating model parameters. The optimization 
infrastructure within LIS is currently being enhanced with a suite of uncertainty estimation 
algorithms based on Bayesian methods. In contrast to the optimization techniques that have already 
been implemented in LIS and generate a single solution for parameters, the newer uncertainty 
estimation tools infer distributions of parameters based on the observational information. These 
parameter distributions can then be used to condition the ensembles used in the data assimilation 
system. The joint use of optimization and data assimilation tools presented here and future 
LIS advancements will enable the increased exploitation of observational data for improving 
hydrological modeling. 

Table 1. Overview of model and assimilation integrations 

Experiment  Description
OL          Noah model integration without assimilation (Open Loop)
OPT1        Noah model integration without assimilation and with model parameters optimized
            to reproduce one-year (2000) climatology of synthetic soil moisture observations
OPT6        Noah model integration without assimilation and with model parameters optimized
            to reproduce six-year (2000-2006) climatology of synthetic soil moisture observations
DA-NOSC     Noah assimilation integration without bias correction using unscaled observations
DA-STDN     Noah assimilation integration using a priori scaling of observations based on
            standard normal deviates
DA-CDF      Noah assimilation integration using a priori scaling of observations based on
            CDF matching
DA-OPT1     Noah assimilation integration using OPT1 model parameters and unscaled observations
DA-OPT6     Noah assimilation integration using OPT6 model parameters and unscaled observations

Table 2. Parameters for perturbations to meteorological forcings and model prognostic variables in 
the EnKF assimilation experiments 

Meteorological forcings                                                     Cross correlations with perturbations in
Variable                             Perturbation type  Standard deviation  SW↓    LW↓    PCP
Downward Shortwave (SW↓)             Multiplicative     0.3 [-]             1.0    -0.5   -0.8
Downward Longwave (LW↓)              Additive           50 W/m^2            -0.5   1.0    0.5
Precipitation (PCP)                  Multiplicative     0.50 [-]            -0.8   0.5    1.0

Noah LSM soil moisture states                                               Cross correlations with perturbations in
Variable                             Perturbation type  Standard deviation  sm1    sm2    sm3    sm4
Total soil moisture - layer 1 (sm1)  Additive           6.0E-3 m^3 m^-3     1.0    0.6    0.4    0.2
Total soil moisture - layer 2 (sm2)  Additive           1.1E-4 m^3 m^-3     0.6    1.0    0.6    0.4
Total soil moisture - layer 3 (sm3)  Additive           0.60E-5 m^3 m^-3    0.4    0.6    1.0    0.6
Total soil moisture - layer 4 (sm4)  Additive           0.40E-5 m^3 m^-3    0.2    0.4    0.6    1.0

Table 3. List of Noah LSM parameters used in the optimization runs. The columns show the variable 
names, a brief description and the range of values (minimum and maximum values) of the parameters 
used in the optimization system. 

No.  Variable Description                                                                   Min value  Max value
1    smcmax   Porosity (-)                                                                  0.30       0.55
2    psisat   Saturated matric potential (-)                                                0.01       0.70
3    dksat    Saturated hydraulic conductivity (m/s)                                        0.05E-5    3.00E-5
4    dwsat    Saturated soil diffusivity (-)                                                5.71E-6    2.33E-5
5    bexp     The “b” parameter (-)                                                         3.0        9.0
6    quartz   Soil quartz content (-)                                                       0.10       0.90
7    rsmin    Minimum stomatal resistance (m)                                               40         1000
8    rgl      Parameter used in solar radiation term of canopy resistance (-)               30         150
9    hs       Parameter used in vapor pressure deficit term of canopy resistance (-)        36.35      55
10   z0       Roughness length (m)                                                          0.01       0.99
11   lai      Leaf area index (-)                                                           0.05       6.00
12   cfactr   Canopy water parameter                                                        0.1        2.0
13   cmcmax   Canopy water parameter (m)                                                    1E-4       2E-3
14   sbeta    Parameter used in the computation of vegetation effect on soil heat flux (-)  -4         -1
15   rsmax    Maximum stomatal resistance (m)                                               2000       10000
16   topt     Optimum transpiration air temperature (K)                                     293        303
17   refdk    Reference value for saturated hydraulic conductivity (m/s)                    5E-7       3E-5
18   fxexp    Bare soil evaporation exponent (-)                                            0.2        4.0
19   refkdt   Reference value for surface infiltration parameter (-)                        0.1        10.0
20   czil     Parameter used in the calculation of roughness length of heat (-)             0.05       0.8
21   csoil    Soil heat capacity for mineral soil component (-)                             1.26E6     3.5E6
22   frzk     Ice threshold (-)                                                             0.10       0.25
23   snup     Snow depth threshold that implies 100% snow cover (m)                         0.02       0.08
24   sh2o1    Initial liquid soil moisture for soil layer 1 (m^3 m^-3)                      0.05       0.50
25   sh2o2    Initial liquid soil moisture for soil layer 2 (m^3 m^-3)                      0.05       0.50
26   sh2o3    Initial liquid soil moisture for soil layer 3 (m^3 m^-3)                      0.05       0.50
27   sh2o4    Initial liquid soil moisture for soil layer 4 (m^3 m^-3)                      0.05       0.50
28   smc1     Initial total soil moisture for soil layer 1 (m^3 m^-3)                       0.05       0.50
29   smc2     Initial total soil moisture for soil layer 2 (m^3 m^-3)                       0.05       0.50
30   smc3     Initial total soil moisture for soil layer 3 (m^3 m^-3)                       0.05       0.50
31   smc4     Initial total soil moisture for soil layer 4 (m^3 m^-3)                       0.05       0.50

Table 4. Comparison of domain averaged unbiased RMSE (ubRMSE) metric values from different 
model integrations (all with the 95% confidence intervals). 

Experiment  Surface soil moisture (m^3 m^-3)  Root zone soil moisture (m^3 m^-3)
OL          0.052 ± 0.001                     0.039 ± 0.001
DA-NOSC     0.041 ± 0.001                     0.037 ± 0.001
DA-STDN     0.038 ± 0.001                     0.035 ± 0.001
DA-CDF      0.037 ± 0.001                     0.034 ± 0.001
DA-OPT1     0.037 ± 0.001                     0.033 ± 0.001
DA-OPT6     0.036 ± 0.001                     0.033 ± 0.001

Figure 1 . Optimization abstractions in LIS: (1) objective function, (2) decision/parameter space, and 
(3) optimization algorithm (LM - Levenberg-Marquardt, GA - Genetic Algorithm, SCE-UA - Shuffled 
Complex Evolution from University of Arizona). Dotted lines represent interconnections between the 
optimization abstractions enabled by the LIS core. Black boxes represent data exchanges between the 
three components through ESMF objects. 

Figure 2. Sequence of GA operations. An example of the population evolution is shown on the right, 
with a population size of 10 potential solutions (s1, s2, ..., s10). The grey bars indicate the fitness values 
of the individual solutions. An example of the selection step shows the choice of s7 after comparing s2 
and s7. After the selection step, the GA operations of recombination, mutation and elitism are conducted 
and a new population of solutions is generated. The algorithm continues until the convergence criteria 
are met. 

Figure 3. Comparison of the surface soil moisture climatology difference fields between the Catchment 
LSM truth and (a) OL, (b) OPT1, and (c) OPT6 (see Table 1). The gray color represents grid cells excluded 
from the computations. Titles indicate domain averaged values. The units are m^3 m^-3. 

Figure 4. (Top) porosity (θ_s, unitless), (middle) saturated matric potential (ψ_s, unitless) and (bottom) 
saturated hydraulic conductivity (K_s, in units of m/s) from (left column) look up tables and (right column) 
estimated through optimization OPT6. The gray color represents grid cells for which parameters were 
not estimated. 


Figure 5. Surface soil moisture skill in terms of anomaly time series correlation coefficients. See Table 1 
for definition of experiments. The gray color represents grid cells excluded from the computations. Titles 
show domain averaged values. 

Figure 6. Same as Figure 5, but for root zone soil moisture. 

Figure 7. PDFs of skill (anomaly R) values across the domain from different model integrations for 
(top) surface soil moisture and (bottom) root zone soil moisture. 

Figure 8. Variance of normalized innovations from different assimilation experiments. The gray color 
represents grid cells excluded from the computations. The titles indicate domain averaged values. 